A Novel Approach for Handling Imbalanced Data in Medical Diagnosis using Undersampling Technique

نویسندگان

  • Varsha Babar
  • Roshani Ade
چکیده

In many data mining applications the imbalanced learning problem is becoming ubiquitous nowadays. When the data sets have an unequal distribution of samples among classes, then these data sets are known as imbalanced data sets. When such highly imbalanced data sets are given to any classifier, then classifier may misclassify the rare samples from the minority class. To deal with such type of imbalance, several undersampling as well as oversampling methods were proposed. Many undersampling techniques do not consider distribution of information among the classes, similarly some oversampling techniques lead to the overfitting or may cause overgeneralization problem. This paper proposes an MLPbased undersampling technique (MLPUS) which will preserve the distribution of information while doing undersampling. This technique uses stochastic measure evaluation for identifying important samples from the majority as well as minority samples. Experiments are performed on 5 real world data sets for the evaluation of performance of proposed work. General Terms Machine Learning, Classification.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An Application of Oversampling, Undersampling, Bagging and Boosting in Handling Imbalanced Datasets

Most classifiers work well when the class distribution in the response variable of the dataset is well balanced. Problems arise when the dataset is imbalanced. This paper applied four methods: Oversampling, Undersampling, Bagging and Boosting in handling imbalanced datasets. The cardiac surgery dataset has a binary response variable (1=Died, 0=Alive). The sample size is 4976 cases with 4.2% (Di...

متن کامل

ClusterOSS: a new undersampling method for imbalanced learning

A dataset is said to be imbalanced when its classes are disproportionately represented in terms of the number of instances they contain. This problem is common in applications such as medical diagnosis of rare diseases, detection of fraudulent calls, signature recognition. In this paper we propose an alternative method for imbalanced learning, which balances the dataset using an undersampling s...

متن کامل

Imbalanced Multiclass Data Classification Using Ant Colony Optimization Algorithm

Class imbalance problems have drawn increasing interest lately because of its classification trouble caused by imbalanced class deliveries and poor prediction performance for minority class. This problem is particularly common in preparation and can be detected in various disciplines including fraud detection, anomaly detection, oil spillage detection, medical diagnosis, facial recognition. Man...

متن کامل

Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination

Chemical Named Entity Recognition (NER) is the basic step for consequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, extraction of the names of the molecules and their properties. Improvement in the performance of such systems may affects the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...

متن کامل

A Novel Intelligent Fault Diagnosis Approach for Critical Rotating Machinery in the Time-frequency Domain

The rotating machinery is a common class of machinery in the industry. The root cause of faults in the rotating machinery is often faulty rolling element bearings. This paper presents a novel technique using artificial neural network learning for automated diagnosis of localized faults in rolling element bearings. The inputs of this technique are a number of features (harmmean and median), whic...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016